Efficient Model Selection for Regularized Classification by Exploiting Unlabeled Data
Abstract
Hyper-parameter tuning is a resource-intensive task when optimizing classification models. The commonly used k-fold cross-validation can become intractable in large-scale settings where a classifier has to learn billions of parameters. At the same time, real-world applications often involve multi-class classification with only a few labeled examples; model selection approaches often offer little improvement in such cases, and the learners' default values are used instead. We propose bounds on accuracy and macro measures (precision, recall, F1) for classification that motivate efficient schemes for model selection and can benefit from the existence of unlabeled data. We demonstrate the advantages of those schemes by comparing them with k-fold cross-validation and hold-out estimation in the setting of large-scale classification.
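For readers unfamiliar with the macro measures named in the abstract, the following is a minimal sketch of how accuracy and macro-averaged precision, recall, and F1 are commonly computed from true and predicted labels. It is not the paper's bounds or its model selection scheme. The function name `macro_scores`, the convention of scoring a class as 0 when its denominator is empty, and the choice to define macro-F1 as the mean of per-class F1 values are all assumptions made for illustration.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro_precision, macro_recall, macro_f1).

    Hypothetical helper for illustration only. Assumptions: a class with
    no predictions (or no true examples) contributes 0 to the average,
    and macro-F1 is the unweighted mean of per-class F1 scores.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    classes = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)   # examples per true class
    pred_count = Counter(y_pred)   # examples per predicted class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives

    precisions, recalls, f1s = [], [], []
    for c in classes:
        prec = hits[c] / pred_count[c] if pred_count[c] else 0.0
        rec = hits[c] / true_count[c] if true_count[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(classes)
    accuracy = sum(hits.values()) / len(y_true)
    # Macro averaging weights every class equally, regardless of its size.
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    y_true = ["a", "a", "a", "b", "c"]
    y_pred = ["a", "a", "b", "b", "c"]
    acc, p, r, f1 = macro_scores(y_true, y_pred)
    print(f"accuracy={acc:.3f} macro-P={p:.3f} macro-R={r:.3f} macro-F1={f1:.3f}")
```

Because macro averaging gives each class equal weight, rare classes influence these scores as much as frequent ones, which is why such measures are often reported for multi-class problems with skewed class sizes.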
Publication date: 2015